LINEA is an open-source R library aimed at simplifying and accelerating the development of linear models to understand the relationship between two or more variables.
Linear models are commonly used in a variety of contexts including natural and social sciences, and various business applications (e.g. marketing, finance).
This page covers how to set up the linea library to analyse a time series.
The library can be installed from CRAN using install.packages('linea') or from GitHub using devtools::install_github('paladinic/linea'). Once installed, you can check the installation:
print(packageVersion("linea"))
## [1] '0.0.1'
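The two installation routes mentioned above look like this (devtools is only needed for the GitHub route):

```r
# Install the release version from CRAN
install.packages('linea')

# ...or the development version from GitHub
# install.packages('devtools')   # if devtools is not already installed
devtools::install_github('paladinic/linea')
```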
The linea library works well with pipes. Used with dplyr and plotly, it can perform data analysis and visualization with elegant code. Let’s build a quick model to illustrate what linea can do.
We start by importing linea, some other useful libraries, and some data.
# libraries
library(linea) # modelling
library(tidyverse) # data manipulation
library(plotly) # visualization
library(DT) # visualization
# fictitious ecommerce data
data_path = 'https://raw.githubusercontent.com/paladinic/data/main/ecomm_data.csv'
# importing flat file
data = read_xcsv(file = data_path)
# adding seasonality and Google trends variables
data = data %>%
get_seasonality(date_col_name = 'date',date_type = 'weekly starting') %>%
gt_f(kw = 'prime day',append = T)
# visualize data
data %>%
datatable(rownames = NULL,
options = list(scrollX = TRUE))
Now let's build a model to understand what drives changes in the ecommerce variable. We can start by selecting a few initial independent variables (christmas, black.friday, trend, and gtrends_prime day).
model = run_model(data = data,
dv = 'ecommerce',
ivs = c('christmas','black.friday','trend','gtrends_prime day'),
id_var = 'date')
summary(model)
##
## Call:
## lm(formula = formula, data = trans_data[, c(dv, ivs_t)])
##
## Residuals:
## Min 1Q Median 3Q Max
## -20614 -4528 -439 3130 54641
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 43718.948 948.974 46.070 < 2e-16 ***
## christmas 300.677 26.410 11.385 < 2e-16 ***
## black.friday 320.211 39.078 8.194 1.21e-14 ***
## trend 129.063 6.118 21.095 < 2e-16 ***
## gtrends_prime day 181.951 42.945 4.237 3.17e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7425 on 256 degrees of freedom
## Multiple R-squared: 0.7498, Adjusted R-squared: 0.7459
## F-statistic: 191.8 on 4 and 256 DF, p-value: < 2.2e-16
Our next steps can be guided by functions like what_next(), which tests adding each of the remaining variables in our data to the model. From the output below, it seems the variables offline_media and covid would improve the model most.
model %>%
what_next()
## Warning: model object does not contain 'meta_data'.
## # A tibble: 81 x 5
## variable adj_R2 t_stat coef adj_R2_diff
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 offline_media 0.836 11.9 6.44 0.121
## 2 covid 0.815 9.80 192. 0.0922
## 3 year_2020 0.814 9.70 12106. 0.0909
## 4 year_2019 0.780 -6.40 -7130. 0.0461
## 5 christmas_eve 0.777 -6.04 -171037. 0.0415
## 6 week_48 0.770 5.30 21478. 0.0326
## 7 christmas_day 0.768 -5.02 -137137. 0.0294
## 8 week_52 0.765 -4.69 -21248. 0.0259
## 9 promo 0.758 3.68 5.62 0.0159
## 10 year_2017 0.753 2.93 3687. 0.00977
## # ... with 71 more rows
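Since what_next() returns a tibble, its output pipes naturally into dplyr. As a sketch (assuming the column names printed above), we could shortlist candidates whose t-statistic suggests a meaningful effect, ranked by their adjusted R-squared gain:

```r
# Shortlist what_next() candidates with dplyr
# (column names taken from the output above)
model %>%
  what_next() %>%
  filter(abs(t_stat) > 2) %>%        # keep candidates with a plausible effect
  arrange(desc(adj_R2_diff)) %>%     # biggest adjusted R-squared gain first
  head(5)
```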
Adding these variables to the model brings the adjusted R-squared to ~88%.
model = run_model(data = data,
dv = 'ecommerce',
ivs = c('christmas','black.friday','trend','gtrends_prime day','covid','offline_media'),
id_var = 'date')
summary(model)
##
## Call:
## lm(formula = formula, data = trans_data[, c(dv, ivs_t)])
##
## Residuals:
## Min 1Q Median 3Q Max
## -21553.8 -2926.7 -664.8 2609.8 16267.2
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.786e+04 7.249e+02 66.019 < 2e-16 ***
## christmas 2.811e+02 1.851e+01 15.187 < 2e-16 ***
## black.friday 2.666e+02 2.773e+01 9.613 < 2e-16 ***
## trend 7.908e+01 5.967e+00 13.252 < 2e-16 ***
## gtrends_prime day 1.848e+02 2.977e+01 6.209 2.16e-09 ***
## covid 1.527e+02 1.623e+01 9.407 < 2e-16 ***
## offline_media 5.508e+00 4.757e-01 11.578 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5140 on 254 degrees of freedom
## Multiple R-squared: 0.881, Adjusted R-squared: 0.8782
## F-statistic: 313.5 on 6 and 254 DF, p-value: < 2.2e-16
Now that we have a decent model, we can extract insights from it, starting with the contribution of each independent variable over time.
model %>%
decomp_chart()
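The decomposition logic can be sanity-checked by hand: in a linear model, a variable's contribution at each point in time is its coefficient times the variable's value. The sketch below assumes coef() works on the linea model object, as summary() does above; this is an assumption, not documented linea API.

```r
# Manual contribution check: coefficient * value per date.
# Assumes the linea model object supports the usual lm accessors
# (an assumption based on summary() working above).
contributions = data.frame(
  date  = data$date,
  trend = coef(model)['trend'] * data$trend,
  covid = coef(model)['covid'] * data$covid
)
head(contributions)
```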
We can also visualize the relationships between our independent and dependent variables using response curves. From this we can see that, for example, when offline_media is 10, ecommerce increases by ~55 (consistent with its coefficient of ~5.5).
model %>%
response_curves(x_min = 0)
The Getting Started page is a good place to start learning how to build linear models with linea.
The Advanced Features page shows how to implement the features of linea that allow users to capture non-linear relationships.
The Additional Features page covers all other functions of the library.
LINEA is being continuously maintained and improved with several features under development.
Here are a few improvements in development:

- linea::what_combo()
- linea::hill_function()

More features are on the way, and there are also commercial products being developed.